I’m assuming you have a basic understanding of antigenic cartography. If not, you can learn more here
library(ggplot2)
library(Racmacs)
options(RacOptimizer.num_cores = parallel::detectCores())
First we need to read some data in. The way to read data in depends on the format.
data1 <- read.csv("data/small-dataset.csv", row.names=1)
# replace "/" in sera names
colnames(data1) <- gsub(".", "/", colnames(data1), fixed=T)
map <- acmap(titer_table = data1)
You don’t always get the same map as the starting coordinates are random - that’s why we run many optimisations. Before each map-making, I’m setting a random seed so the answer is reproducible.
ran_seed <- 5387
You can make the map directly from the table or from the acmap object we just created. These methods are effectively equivalent.
set.seed(ran_seed)
map_from_table <- make.acmap(titer_table = data1, number_of_dimensions = 2, number_of_optimizations = 100)
## a
## Warning: The following ANTIGENS have do not have enough titrations to position in 2 dimensions. Coordinates were still optimized but positions will be unreliable
##
## 'EN/7/94'
## 'FI/339/95'
## Took 0.14 secs
## Warning in optimizeMap(map = map, number_of_dimensions = number_of_dimensions,
## : There is some variation (4.18 AU for one point) in the top runs, this may be
## an indication that more optimization runs could help achieve a better optimum.
## If this still fails to help see ?unstableMaps for further possible causes.
set.seed(ran_seed)
map_from_map <- optimizeMap(map = map, number_of_dimensions = 2, number_of_optimizations = 100)
## a
## Warning: The following ANTIGENS have do not have enough titrations to position in 2 dimensions. Coordinates were still optimized but positions will be unreliable
##
## 'EN/7/94'
## 'FI/339/95'
## Took 0.1 secs
## Warning in optimizeMap(map = map, number_of_dimensions = 2,
## number_of_optimizations = 100): There is some variation (4.18 AU for one point)
## in the top runs, this may be an indication that more optimization runs could
## help achieve a better optimum. If this still fails to help see ?unstableMaps
## for further possible causes.
If you a sensible reason, you can fix the column base:
set.seed(ran_seed)
map_fixed_col_base <- optimizeMap(map = map, number_of_dimensions = 2, fixed_column_bases = rep(8, 21), number_of_optimizations = 100)
## a
## Warning: The following ANTIGENS have do not have enough titrations to position in 2 dimensions. Coordinates were still optimized but positions will be unreliable
##
## 'EN/7/94'
## 'FI/339/95'
## Took 0.47 secs
## Warning in optimizeMap(map = map, number_of_dimensions = 2, fixed_column_bases
## = rep(8, : There is some variation (5.03 AU for one point) in the top runs,
## this may be an indication that more optimization runs could help achieve a
## better optimum. If this still fails to help see ?unstableMaps for further
## possible causes.
Or set a minimum column base:
set.seed(ran_seed)
map_min_col_base <- optimizeMap(map = map, number_of_dimensions = 2, minimum_column_basis = "1280", number_of_optimizations = 100)
## a
## Warning: The following ANTIGENS have do not have enough titrations to position in 2 dimensions. Coordinates were still optimized but positions will be unreliable
##
## 'EN/7/94'
## 'FI/339/95'
## Took 0.15 secs
## Warning in optimizeMap(map = map, number_of_dimensions = 2,
## minimum_column_basis = "1280", : There is some variation (4.26 AU for one
## point) in the top runs, this may be an indication that more optimization runs
## could help achieve a better optimum. If this still fails to help see
## ?unstableMaps for further possible causes.
You can also make a map in 3D. We will check later if this is a good idea.
set.seed(ran_seed)
map_3d <- optimizeMap(map = map, number_of_dimensions = 3, number_of_optimizations = 100)
## a
## Warning: The following ANTIGENS are too underconstrained to position in 3 dimensions and coordinates have been set to NaN:
##
## 'EN/7/94'
## 'FI/339/95'
## Warning: The following ANTIGENS have do not have enough titrations to position in 3 dimensions. Coordinates were still optimized but positions will be unreliable
##
## 'MA/G252/93'
## 'NL/372/93'
## 'NL/399/93'
## 'FI/338/95'
## 'FI/381/95'
## 'HK/49/95'
## 'NL/271/95'
## 'HK/357/96'
## 'HK/358/96'
## 'LY/1781/96'
## 'JO/10/97'
## 'OS/21/97'
## 'OS/244/97'
## Took 0.22 secs
## Warning in optimizeMap(map = map, number_of_dimensions = 3,
## number_of_optimizations = 100): There is some variation (3.79 AU for one point)
## in the top runs, this may be an indication that more optimization runs could
## help achieve a better optimum. If this still fails to help see ?unstableMaps
## for further possible causes.
save.acmap(map_from_map, "out/map_from_map.ace")
Remember when merging tables that names & IDs need to be identical
First, open the file with the table to be merged.
data2 <- read.csv("data/small-dataset-2.csv", row.names=1)
# replace "/" in sera names
colnames(data2) <- gsub(".", "/", colnames(data2), fixed=T)
map2 <- acmap(titer_table= data2)
The simplest method is to merge table together. You can also give the function a list of maps to merge.
merge_table <- mergeMaps(map, map2, method="table")
## a
## Warning: The 'conservative' titer merge method was used when merging titers, which differs to the 'likelihood' method employed in older Racmacs versions. You can suppress this warning by setting the merge method explicitly with "merge_options = list(method = 'conservative')")
## This warning is displayed once every 8 hours.
merge_map <- optimizeMap(merge_table, 2, 100)
## a
## Warning: The following ANTIGENS have do not have enough titrations to position in 2 dimensions. Coordinates were still optimized but positions will be unreliable
##
## 'EN/7/94'
## 'FI/339/95'
## Took 0.11 secs
## Warning in optimizeMap(merge_table, 2, 100): There is some variation (4.57 AU
## for one point) in the top runs, this may be an indication that more
## optimization runs could help achieve a better optimum. If this still fails to
## help see ?unstableMaps for further possible causes.
If you already have a map, you can merge your new data in either by using the original map as a starting point, or freezing the original map and then merging new data in (this can be useful when you are merging mutants in and want to keep the background map consistent). I won’t demonstrate the frozen merge here as the two datasets completely overlap.
merge_incremental <- mergeMaps(map_from_map, map2, method="incremental-merge", number_of_dimensions = 2, number_of_optimizations = 100)
## a
There are some additional methods described here that may be applicable to your research.
plot(map_from_map, plot_stress=T)
ggplot(map_from_map, plot_stress=T)
view(map_from_map)
Largely copied from here and there is also a youtube tutorial
Sequence data is added as a matrix, so if you start with a fasta file you will need to do a little pre-processing (aligning and trimming your sequences & then changing from a list to a matrix).
ag_sequences <- read.csv(
file = system.file("extdata/h3map2004_sequences_ag.csv", package = "Racmacs"),
colClasses = "character",
row.names = 1
)
sr_sequences <- read.csv(
file = system.file("extdata/h3map2004_sequences_sr.csv", package = "Racmacs"),
colClasses = "character",
row.names = 1
)
agSequences(map_from_map) <- ag_sequences[agNames(map_from_map),]
srSequences(map_from_map) <- sr_sequences[srNames(map_from_map),]
Color, shape, size, outline colour and outline width can be customised for antigens and sera. A few examples below. You can find more detail on styling maps here
# color by year
yr <- as.numeric(paste0(c(rep("19", 36), rep("20", 13)), sapply(strsplit(agNames(map_from_map), split="/", fixed=T), "[[", 3)))
agFill(map_from_map) <- rainbow(12)[yr-1992]
view(map_from_map)
agFill(map_from_map) <- "grey60"
# color by genetics
agFill(map_from_map)[agSequences(map_from_map)[,156]=="K"] <- "forestgreen"
agFill(map_from_map)[agSequences(map_from_map)[,156]=="Q"] <- "skyblue"
agFill(map_from_map)[agSequences(map_from_map)[,156]=="H"] <- "gold"
agSize(map_from_map)[c("PM/2007/99", "SY/5/97", "NA/933/95", "WU/359/95")] <- 10
agShape(map_from_map)[c(36, 29, 11, 13)] <- "EGG"
srShape(map_from_map)[c("PM/2007/99", "SY/5A/97", "SY/5B/97", "SY/5HAY/97", "SY/5V/97", "NA/933/95","WU/359B/95")] <- "UGLYEGG"
view(map_from_map)
The stress is the measure of how well the data fits the map (lower stress is better). It can be calculated for the whole map or per antigen & per serum
mapStress(map_from_map)
## [1] 151.5363
hist(agStress(map_from_map))
hist(srStress(map_from_map))
hist(agStressPerTiter(map_from_map))
hist(srStressPerTiter(map_from_map))
ams <- allMapStresses(map_from_map)
head(ams)
## [1] 151.5363 153.0233 155.2882 155.9242 156.1003 156.1613
plot(ams)
If the top values of the stress are not similar, then the optimisation to make the map has not converged and either more optimisations are required or there are other issues with your data.
We can also compare stress between different merging methods
mapStress(merge_map)
## [1] 155.7719
mapStress(merge_incremental)
## [1] 154.5767
view(map_from_map)
From the “Coloring” tab you can colour by stress; higher stress points are ones where the antigenic map fits the data less well.
From the “Diagnostics” tab, you can select an antigen and turn on
We use procrustes to compare two maps with antigens & sera in common.
First, we need to get another map to compare with (in that I need to remoce the antigen & serum IDs from the 2004 map so that the algorithm matches by name).
map_2004 <- read.acmap("data/seq-t9a-mod27.ace")
srIDs(map_2004) <- rep("", numSera(map_2004))
agIDs(map_2004) <- rep("", numAntigens(map_2004))
map_from_map <- realignMap(map_from_map, map_2004)
pc <- procrustesMap(map_from_map, map_2004)
view(pc)
pc_data <- procrustesData(map_from_map, map_2004)
names(pc_data)
## [1] "ag_dists" "sr_dists" "ag_rmsd" "sr_rmsd" "total_rmsd"
pc_data$total_rmsd
## [1] 1.264435
You can see another example of procrustes, including 3D procrustes, here
We can also perform a procrustes to the other optimisations and plot this.
pc_rmsd <- pc_rmedsd <- rep(NA, numOptimizations(map_from_map))
for (i in 1:numOptimizations(map_from_map)){
pc_output <- procrustesData(map_from_map, map_from_map, 1, i)
pc_rmsd[i] <- pc_output$total_rmsd
pc_rmedsd[i] <- sqrt(median(c(pc_output$ag_dists, pc_output$sr_dists)^2))
}
plot(pc_rmsd)
plot(ams, pc_rmsd)
plot(pc_rmedsd)
plot(ams, pc_rmedsd)
plot(pc_rmsd, pc_rmedsd, asp=1)
abline(0,1)
Ideally there will be a low procrustes distance between the top optimisations. If maps with similar stress have high procrustes distances, then the map is metastable. The RMSD (root mean square deviation) is affected by outliers such as a small number of antigens or sera hemisphering so the RMedSD (root median square deviation) can be a more robust measure.
Point uncertainty can be visualised using bootsrapping where data is randomly excluded and the map re-made. Here I’ve used the default parameters of 1000 repeats and 100 optimisations per repeat. I’ve commented out the code as it might take a while to run on your machine - you can simply load the completed map for plotting.
There’s more detail on the bootstrapping methods here
# set.seed(ran_seed)
# bs_map <-bootstrapMap(map_from_map, method = "resample")
# save.acmap(bs_map, "out/bs_map.ace")
# blob95 <- bootstrapBlobs(bs_map, , conf.level=0.95)
# blob68 <- bootstrapBlobs(bs_map)
# blob10 <- bootstrapBlobs(bs_map, conf.level=0.1)
# save.acmap(blob95, "out/blob95_map.ace")
# save.acmap(blob68, "out/blob68_map.ace")
# save.acmap(blob10, "out/blob10_map.ace")
blob68 <- read.acmap("out/blob68_map.ace")
plot(blob68, plot_stress=T)
view(blob68)
blob10 <- read.acmap("out/blob10_map.ace")
plot(blob10, plot_stress=T)
view(blob10)
Firstly, we can exclude the antigens that are uncoordinated and giving the first warning.
map_from_map_rm <- removeAntigens(map_from_map, c('EN/7/94', 'FI/339/95'))
set.seed(ran_seed)
map_from_map_rm <- optimizeMap(map = map_from_map_rm, number_of_dimensions = 2, number_of_optimizations = 100)
## Discarding previous optimization runs.
## a
## Took 0.23 secs
## Warning in optimizeMap(map = map_from_map_rm, number_of_dimensions = 2, : There
## is some variation (2.42 AU for one point) in the top runs, this may be an
## indication that more optimization runs could help achieve a better optimum. If
## this still fails to help see ?unstableMaps for further possible causes.
pc <- procrustesMap(map_from_map_rm, map_from_map_rm, 1, 2)
view(pc)
Then we can remove other antigens that are potentially poorly coordinated
hist(rowSums(data1!="*"), xlab="Number of measured data points per antigen", breaks=1:20)
map_from_map_rm2 <- removeAntigens(map_from_map, which(rowSums(data1!="*")<4))
set.seed(ran_seed)
# map_from_map_rm2 <- optimizeMap(map = map_from_map_rm2, number_of_dimensions = 2, number_of_optimizations = 100)
map_from_map_rm2 <- removeSera(map_from_map_rm2, 'MA/G252/93')
set.seed(ran_seed)
map_from_map_rm2 <- optimizeMap(map = map_from_map_rm2, number_of_dimensions = 2, number_of_optimizations = 100)
## Discarding previous optimization runs.
## a
## Took 0.08 secs
## Warning in optimizeMap(map = map_from_map_rm2, number_of_dimensions = 2, :
## There is some variation (2.86 AU for one point) in the top runs, this may be an
## indication that more optimization runs could help achieve a better optimum. If
## this still fails to help see ?unstableMaps for further possible causes.
pc <- procrustesMap(map_from_map_rm2, map_from_map_rm2, 1, 2)
view(pc)
We still have an unstable map. This is probably due to the limitations of the data and if this was a real experiment, we would consider making further measurments.
You can check how well different numbers of dimensions represent the underlying data. The stress will usually decrease when increasing the number of dimensions.
mapStress(map_from_map)
## [1] 151.5363
mapStress(map_3d)
## [1] 130.9359
By removing data, we can test how well this removed data is predicted by maps with differing numbers of dimensions (cross-validation). The default setting test 1-5 dimesions, removing 10% of the data, 1000 optimisation and 100 replicates per dimension. Here I’ve saved the output of the code as it takes a while to run.
# dim_test <- dimensionTestMap(map_from_map)
# save(dim_test, file="out/dim_test.RData")
load("out/dim_test.RData")
dim_test
## dimensions mean_rmse_detectable var_rmse_detectable mean_rmse_nondetectable
## 1 1 1.389009 0.04285565 1.221791
## 2 2 1.229333 0.11286930 1.167984
## 3 3 1.255879 0.15759603 1.290677
## 4 4 1.253939 0.19611440 1.320766
## 5 5 1.286569 0.33971046 1.328394
## var_rmse_nondetectable replicates
## 1 0.1651072 100
## 2 0.1506297 100
## 3 0.1439530 100
## 4 0.1370128 100
## 5 0.1315355 100
#plotting for detectable titres
ci <- 1.96*sqrt(dim_test$var_rmse_detectable/100)
plot(dim_test$dimensions, dim_test$mean_rmse_detectable, ylim=range(c(dim_test$mean_rmse_detectable+ci, dim_test$mean_rmse_detectable-ci)))
arrows(dim_test$dimensions, dim_test$mean_rmse_detectable+ci, dim_test$dimensions, dim_test$mean_rmse_detectable-ci, length=0.1, angle=90, code=3)
The minimum is at 2 dimensions, suggesting this is best for this dataset. Sometimes you may want to use fewer dimensions for interpretability if there is marginal benefit with higher dimensions.
dist_mat <- as.matrix(dist(agCoords(map_from_map)))
If they are sufficiently similar, they can be merged.
Map cohesion calculates the vertex connectivity of the map (the minimum number of point that need to be removed to eliminate all paths from one point to another). If you have n overlapping sera & antigens between merged tables, then the vertex connectivity will be 2n (assuming that non of these titres are undetectable). A vertex connectivity of “number of map dimensions + 1” is required for map stability. Therefore, 3 overlapping antigens and sera (vertex connectivity = 6) should be sufficient if the map has 3 dimensions (minimum vertex connectivity = 4).
# original map
mapCohesion(map_from_map)
## [1] 2
# map with uncoordinated antigens removed
mapCohesion(map_from_map_rm)
## [1] 3
# map with potentialy poorly coordinated antigens removed
mapCohesion(map_from_map_rm2)
## [1] 5